Corpus-based Learning for Information Extraction

Author

Thierry POIBEAU

Abstract

This paper presents an integrated framework for extracting generic information from multi-domain texts. It shows that the AFP newswire exhibits regularities that can be processed by a collection of wrappers. It also presents a set of linguistic resources able to extract generic information from texts. Finally, the paper presents a collection of machine learning techniques for extracting more specific information from a document repository. The result of the analysis is a generic event-based template per document, accessible through hypertext and graphical interfaces.
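The abstract does not say what the wrappers or templates look like, so the following is only a minimal sketch of the general idea: a wrapper that exploits a regular dispatch layout, plus a lexicon lookup, fills one generic template per document. The dateline layout (`CITY, DD Month YYYY (AFP) - text`), the `GAZETTEER` entries, and the names `EventTemplate` and `wrap_dispatch` are all hypothetical and chosen for illustration; they do not describe the real AFP format or the author's resources, and the machine-learning stage is not shown.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical dateline layout assumed for illustration only:
# "CITY, DD Month YYYY (AGENCY) - body text"
DATELINE = re.compile(
    r"^(?P<location>[A-Z][A-Z .'-]+),\s+"
    r"(?P<date>\d{1,2} [A-Z][a-z]+ \d{4})\s+"
    r"\((?P<agency>[A-Z]+)\)\s+-\s+"
    r"(?P<body>.*)$",
    re.DOTALL,
)

# Toy gazetteer standing in for the paper's linguistic resources (assumption).
GAZETTEER: Dict[str, List[str]] = {
    "organization": ["United Nations", "World Bank"],
    "person": ["Kofi Annan"],
}


@dataclass
class EventTemplate:
    """Generic per-document template (hypothetical field set)."""
    location: Optional[str] = None
    date: Optional[str] = None
    agency: Optional[str] = None
    entities: Dict[str, List[str]] = field(default_factory=dict)


def wrap_dispatch(text: str) -> EventTemplate:
    """Fill a template from one dispatch: wrapper first, then lexicon lookup."""
    template = EventTemplate()
    body = text
    match = DATELINE.match(text.strip())
    if match:
        # The regular header layout is what the wrapper relies on.
        template.location = match.group("location").title()
        template.date = match.group("date")
        template.agency = match.group("agency")
        body = match.group("body")
    # Generic extraction: record every gazetteer entry found in the body.
    for label, names in GAZETTEER.items():
        found = [name for name in names if name in body]
        if found:
            template.entities[label] = found
    return template
```

For example, `wrap_dispatch("PARIS, 12 March 2007 (AFP) - Kofi Annan met World Bank officials.")` yields a template with location `Paris`, date `12 March 2007`, agency `AFP`, and the two entities found in the body. A dispatch that does not match the assumed layout still gets a template, just with empty header fields, which keeps the output uniform across documents.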


Related resources

Corpus based coreference resolution for Farsi text

"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or in the real world. The main task of a coreference resolution system is therefore to identify the terms that refer to the same entity. A coreference resolution tool could be...


Yap, Willy and Timothy Baldwin (2009) Experiments on Pattern-based Relation Learning, in Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM 2009), Hong Kong, China

Relation extraction is the task of extracting semantic relations, such as synonymy or hypernymy, between word pairs from corpus data. Past work in relation extraction has concentrated on manually creating templates to use in directly extracting word pairs for a given semantic relation from corpus text. Recently, there has been a move towards using machine learning to automatically learn these pa...


A New Text-Mining Method for Extracting User Context Information to Improve Search Engine Result Ranking

Today, the importance of text processing and its applications is well known among researchers and students. The amount of textual material and documents grows day by day, so we need effective ways to store them and retrieve information from them. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the ones most similar to the user ...


WI&CRF: A Proposed Method for Extracting Required Information from Military Texts

Military information extraction techniques are of interest to military managers and commanders. However, conventional information extraction techniques cannot be applied to this domain, because military corpora have a special structure that differs from that of non-military corpora. In this paper, the structure of military documents is compared with that of non-military documents. Moreover, a new classification is proposed ...


Learning Patterns for Information Extraction from Free Text

We describe a general approach to the task of information extraction from free text and propose methods for learning syntax patterns automatically from annotated corpora. We study the application of our approach to the extraction of protein-protein interactions from scientific texts. Based on this evaluation, we find that learning patterns outperforms techniques based on handcrafted patterns.


A Methodology for Semantically Annotating a Corpus Using a Domain Ontology and Machine Learning

In this paper we present a methodology for the semantic annotation of domain-specific corpora. This method relies on a domain ontology used initially for identifying and annotating domain-specific instances within the corpus. A machine learning-based information extraction system is then trained on the annotated corpus. The final result of this process is a model which is used to annotate new co...



Publication date: 2007
